The present invention relates to the field of information technology and, more particularly, to systems and techniques for document processing.
For most organizations, information can be the foundation for competitive differentiation, enabling faster processing times, reduced operating costs, quicker access to information, and assured compliance. Alternatively, by its sheer volume and complexity alone, information can thwart productivity, waste time and resources, and strain the IT infrastructure that supports it.
A key to using information successfully is the ability to efficiently capture and manage large volumes of information from disparate sources. Business-critical information arrives in many forms, including paper and fax. Once transformed into intelligent content, this information can feed enterprise applications such as enterprise content management, enterprise resource planning, customer relationship management, and other information systems.
Grouping and classifying scanned paper documents can be very difficult because of optical character recognition (OCR) errors, differences in text, differences in graphics, noise, stray marks, rotation, skew, handwriting, and other factors.
Thus, there is a need for systems and techniques that automatically group and classify documents.